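Existing gait recognition methods either build a global feature representation (GFR) directly from the raw gait sequence or generate a local feature representation (LFR) from several local parts. However, GFR tends to neglect local details of human posture in the deeper network layers. LFR lets the network focus on the detailed posture information of each local region, but it ignores the relations among different local parts and therefore exploits only limited local information from a few specific regions. To address these issues, we propose a global-local gait recognition network, named GaitGL, that generates more discriminative feature representations. Specifically, we develop a novel Global and Local Convolutional Layer (GLCL) to take full advantage of both global visual information and local region details in each layer. GLCL is a dual-branch structure consisting of a GFR extractor and a mask-based LFR extractor. The GFR extractor aims to extract contextual information, such as the relations among body parts, while the mask-based LFR extractor exploits the detailed posture changes of local regions. In addition, we introduce a novel mask-based strategy to improve local feature extraction. Specifically, we design pairs of complementary masks that randomly occlude the feature maps, and then train the mask-based LFR extractor on the resulting occluded feature maps. In this way, the LFR extractor learns to fully exploit local information. Extensive experiments show that GaitGL outperforms state-of-the-art gait recognition methods. The average rank-1 accuracy on CASIA-B, OU-MVLP, GREW, and Gait3D is 93.6%, 98.7%, 68.0%, and 63.8%, respectively, significantly better than competing methods. The proposed method won first prize in two competitions: HID 2020 and HID 2021.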
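The complementary-mask idea can be illustrated with a small sketch. The abstract does not specify the shape of the masks, so the horizontal band used here, along with the names `complementary_masks` and `apply_mask`, is a hypothetical choice for illustration only, not the GaitGL implementation.

```python
import random


def complementary_masks(height, width, band, rng):
    """Build two 0/1 masks that sum to 1 at every position.

    Assumption: the first mask hides a random horizontal band of `band` rows;
    the second mask hides everything else.
    """
    start = rng.randrange(0, height - band + 1)
    mask_a = [[0 if start <= r < start + band else 1 for _ in range(width)]
              for r in range(height)]
    mask_b = [[1 - v for v in row] for row in mask_a]
    return mask_a, mask_b


def apply_mask(feature_map, mask):
    """Occlude a feature map by element-wise multiplication with a mask."""
    return [[f * m for f, m in zip(f_row, m_row)]
            for f_row, m_row in zip(feature_map, mask)]


rng = random.Random(0)
feature_map = [[float(r * 4 + c) for c in range(4)] for r in range(8)]
mask_a, mask_b = complementary_masks(8, 4, band=3, rng=rng)
occluded_a = apply_mask(feature_map, mask_a)
occluded_b = apply_mask(feature_map, mask_b)

# Because the masks are complementary, the two occluded maps together
# contain exactly the information of the original map.
restored = [[a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(occluded_a, occluded_b)]
```

In a training loop, each occluded map would be fed to the local branch separately, so that no single region can be relied on exclusively.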
Translated by Google Translate
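We develop a structural econometric model to capture the decision dynamics of human evaluators on an online micro-lending platform, and estimate the model parameters using a real-world dataset. We find two types of gender bias in the evaluators' decisions: preference-based bias and belief-based bias. Both types favor female applicants. Through counterfactual simulations, we quantify the effect of gender bias on loan-granting outcomes and on the welfare of the company and the borrowers. Our results imply that the preference-based bias and the belief-based bias each reduce the company's profits. When the preference-based bias is removed, the company earns more profit. When the belief-based bias is removed, the company's profit also increases. Both increases result from raising the approval probability for borrowers, especially male borrowers, who eventually repay their loans. For borrowers, eliminating either bias narrows the gender gap in true positive rates in credit risk evaluation. We also train machine learning algorithms on both the real-world data and the data from the counterfactual simulations. We compare the decisions made by these algorithms to see how the evaluators' biases are inherited by the algorithms and reflected in machine-based decisions. We find that machine learning algorithms can mitigate both the preference-based bias and the belief-based bias.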
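The gender gap in true positive rates mentioned above can be made concrete with a toy calculation. The data below are invented for illustration and assume the repayment outcome is known for every applicant; the function name `true_positive_rate` is hypothetical and this is not the paper's estimation procedure.

```python
def true_positive_rate(approved, repaid):
    """Share of applicants who would repay (repaid == 1) that were approved."""
    approvals_of_good = [a for a, r in zip(approved, repaid) if r == 1]
    return sum(approvals_of_good) / len(approvals_of_good)


# Hypothetical toy data: 1 = approved / repaid, 0 = rejected / defaulted.
female_approved = [1, 1, 1, 0]
female_repaid = [1, 1, 0, 1]
male_approved = [1, 0, 0, 1]
male_repaid = [1, 1, 1, 0]

tpr_female = true_positive_rate(female_approved, female_repaid)  # 2 of 3 good borrowers approved
tpr_male = true_positive_rate(male_approved, male_repaid)        # 1 of 3 good borrowers approved
tpr_gap = tpr_female - tpr_male
```

A positive `tpr_gap` means creditworthy female applicants are approved more often than creditworthy male applicants; removing a bias that favors women would be expected to shrink it.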
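Cross-domain recommendation can help alleviate the data sparsity problem in traditional sequential recommender systems. In this paper, we propose the RecGURU algorithm framework to generate a generalized user representation (GUR) that incorporates user information across domains in sequential recommendation, even when the two domains share few or no common users. We propose a self-attentive autoencoder to derive latent user representations, and a domain discriminator that aims to predict the domain of origin of a generated latent representation. We propose a novel adversarial learning method to train the two modules so that user embeddings generated from different domains are unified into a single global GUR for each user. The learned GUR captures a user's overall preferences and characteristics, and can therefore be used to augment behavior data and improve recommendations in any single domain in which the user is involved. Extensive experiments were conducted on two public cross-domain recommendation datasets as well as a large dataset collected from real-world applications. The results show that RecGURU improves performance and outperforms various state-of-the-art sequential recommendation and cross-domain recommendation methods. The collected data will be released to facilitate future research.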
For low-level computer vision and image processing ML tasks, training on large datasets is critical for generalization. However, the standard practice of relying on real-world images, primarily from the Internet, comes with image quality, scalability, and privacy issues, especially in commercial contexts. To address this, we have developed a procedural synthetic data generation pipeline and dataset tailored to low-level vision tasks. Our Unreal Engine-based synthetic data pipeline populates large scenes algorithmically with a combination of random 3D objects, materials, and geometric transformations. Then, we calibrate the camera noise profiles to synthesize the noisy images. From this pipeline, we generated a fully synthetic image denoising dataset (FSID), which consists of 175,000 noisy/clean image pairs. We then trained and validated a CNN-based denoising model, and demonstrated that the model trained on this synthetic data alone can achieve competitive denoising results when evaluated on real-world noisy images captured with smartphone cameras.
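The step of synthesizing a noisy image from a clean render can be sketched as follows. The abstract does not state which noise model was calibrated, so this example assumes a common signal-dependent Gaussian approximation (variance = shot_gain * intensity + read_std^2); the function name and all parameter values are hypothetical.

```python
import random


def add_sensor_noise(clean, shot_gain, read_std, rng):
    """Return a noisy copy of `clean` (pixel values in [0, 1]).

    Assumed noise model: zero-mean Gaussian whose variance grows linearly
    with intensity (shot noise) plus a constant read-noise term.
    """
    noisy = []
    for row in clean:
        noisy_row = []
        for x in row:
            std = (shot_gain * x + read_std ** 2) ** 0.5
            value = x + rng.gauss(0.0, std)
            noisy_row.append(min(1.0, max(0.0, value)))  # clip to valid range
        noisy.append(noisy_row)
    return noisy


clean = [[0.2, 0.5], [0.8, 1.0]]
noisy = add_sensor_noise(clean, shot_gain=0.01, read_std=0.02, rng=random.Random(0))
pair = (noisy, clean)  # one noisy/clean training pair
```

In a calibrated pipeline, `shot_gain` and `read_std` would be measured per camera and per ISO setting rather than chosen by hand.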
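The combination of ordinary differential equations and neural networks, i.e., neural ordinary differential equations (Neural ODE), has been widely studied from various angles. However, deciphering the numerical integration in Neural ODE remains an open challenge, as many studies have shown that numerical integration significantly affects model performance. In this paper, we propose the inverse modified differential equations (IMDE) to clarify the influence of numerical integration on training Neural ODE models. The IMDE is determined by the learning task and the ODE solver employed. We show that training a Neural ODE model actually returns a close approximation of the IMDE, rather than of the true ODE. With the help of the IMDE, we deduce that (i) the discrepancy between the learned model and the true ODE is bounded by the sum of the discretization error and the learning loss; and (ii) Neural ODE using non-symplectic numerical integration theoretically fails to learn conservation laws. Several experiments are performed to numerically verify our theoretical analysis.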
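Why the choice of integrator matters for conservation laws can be seen on a known system, without any learning. The sketch below integrates the harmonic oscillator H(q, p) = (q^2 + p^2) / 2 with explicit Euler (non-symplectic) and with symplectic Euler. It only illustrates the integrators' behavior and does not reproduce the IMDE analysis; the step size and step count are arbitrary choices.

```python
def energy(q, p):
    """Hamiltonian of the unit harmonic oscillator."""
    return 0.5 * (q * q + p * p)


def explicit_euler_step(q, p, h):
    """Non-symplectic: both updates use the old state."""
    return q + h * p, p - h * q


def symplectic_euler_step(q, p, h):
    """Symplectic: update p first, then use the new p to update q."""
    p_next = p - h * q
    q_next = q + h * p_next
    return q_next, p_next


def final_energy(step, steps=1000, h=0.1):
    q, p = 1.0, 0.0  # initial energy is 0.5
    for _ in range(steps):
        q, p = step(q, p, h)
    return energy(q, p)


energy_explicit = final_energy(explicit_euler_step)
energy_symplectic = final_energy(symplectic_euler_step)
```

Explicit Euler multiplies the energy by (1 + h^2) at every step, so it grows without bound, whereas symplectic Euler keeps the energy within a small band around its initial value of 0.5.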
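We propose volume-preserving networks (VPNets) for learning unknown source-free dynamical systems from trajectory data. We propose three modules and combine them to obtain two network architectures, coined R-VPNet and LA-VPNet. The distinctive feature of the proposed models is that they are intrinsically volume-preserving. In addition, we prove the corresponding approximation theorems, which theoretically guarantee the expressivity of the proposed VPNets for learning source-free dynamics. Numerical experiments demonstrate the effectiveness, generalization ability, and structure-preserving property of VPNets.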
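A map is volume-preserving when its Jacobian determinant equals 1 everywhere. The abstract does not describe the three VPNet modules, so the sketch below uses a generic construction instead: composing shear maps, each of which changes one coordinate by a function of the other. The functions, weights, and the finite-difference check are all illustrative assumptions.

```python
import math


def shear_y(point, weight):
    """(x, y) -> (x, y + w * tanh(x)); Jacobian is unit lower-triangular."""
    x, y = point
    return (x, y + weight * math.tanh(x))


def shear_x(point, weight):
    """(x, y) -> (x + w * tanh(y), y); Jacobian is unit upper-triangular."""
    x, y = point
    return (x + weight * math.tanh(y), y)


def flow(point):
    """Composition of two shears; determinants multiply, so det stays 1."""
    return shear_x(shear_y(point, 0.7), -1.3)


def jacobian_det(f, point, eps=1e-6):
    """Estimate the 2x2 Jacobian determinant with central differences."""
    x, y = point
    fx_plus, fx_minus = f((x + eps, y)), f((x - eps, y))
    fy_plus, fy_minus = f((x, y + eps)), f((x, y - eps))
    a = (fx_plus[0] - fx_minus[0]) / (2 * eps)  # d out_x / d x
    c = (fx_plus[1] - fx_minus[1]) / (2 * eps)  # d out_y / d x
    b = (fy_plus[0] - fy_minus[0]) / (2 * eps)  # d out_x / d y
    d = (fy_plus[1] - fy_minus[1]) / (2 * eps)  # d out_y / d y
    return a * d - b * c


det_at_point = jacobian_det(flow, (0.3, -0.8))
```

Replacing `tanh` with a small learned network keeps the determinant at 1, which is what makes such layers volume-preserving by construction rather than by penalty.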
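The bidirectional reflectance distribution function (BRDF) is pervasively used in computer graphics to produce realistic, physically based appearance. In recent years, several works have explored using neural networks to represent BRDFs, taking advantage of neural networks' high compression rate and their ability to fit highly complex functions. However, once represented, the BRDFs are fixed and therefore lack the flexibility to take part in follow-up operations. In this paper, we present a form of "Neural BRDF algebra" and focus on both the representation of BRDFs and operations on them. We propose a representation neural network that compresses BRDFs into latent vectors and is able to represent BRDFs accurately. We further propose several operations that can be applied solely in the latent space, such as layering and interpolation. Spatial variation is straightforward to achieve by using textures of latent vectors. Furthermore, our representation can be evaluated and sampled efficiently, providing a competitive alternative to more expensive Monte Carlo layering approaches.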
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
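The attack success rate (ASR) reported above can be computed as the fraction of triggered test inputs that the model assigns to the attacker's target label. The abstract does not define the metric precisely; the sketch below assumes the common convention of excluding inputs whose true label already equals the target. The function name and toy labels are hypothetical.

```python
def attack_success_rate(predictions, true_labels, target_label):
    """Fraction of triggered inputs classified as `target_label`.

    Assumption: inputs whose true label is already the target are excluded,
    since they would count as successes even without a backdoor.
    """
    eligible = [pred for pred, true in zip(predictions, true_labels)
                if true != target_label]
    if not eligible:
        return 0.0
    hits = sum(1 for pred in eligible if pred == target_label)
    return hits / len(eligible)


# Toy example: model predictions on five triggered inputs, target label 0.
predictions = [0, 0, 0, 2, 0]
true_labels = [1, 2, 3, 1, 0]
asr = attack_success_rate(predictions, true_labels, target_label=0)
```

Here four inputs are eligible and three are misclassified as the target, giving an ASR of 0.75; a value close to 1.0 corresponds to the DOORPING results described above.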
Automatic music generation with artificial intelligence typically requires a large amount of data, which is hard to obtain for many less common genres and musical instruments. To tackle this issue, we present ongoing work and preliminary findings on the possibility for deep models to transfer knowledge from language to music, by finetuning large language models pre-trained on a massive text corpus on only hundreds of MIDI files of drum performances. We show that by doing so, one of the largest state-of-the-art models (GPT3) is capable of generating reasonable drum grooves, while models that are not pre-trained (Transformer) show no such ability beyond naive repetition. Evaluating generated music is a challenging task, and evaluating drum grooves is even more so, given how little precedent exists in the literature. Hence, we propose a tailored structural evaluation method and analyze drum grooves produced by GPT3 compared to those played by human professionals, exposing the strengths and weaknesses of such generation by language-to-music transfer. Our findings suggest that language-to-music transfer learning with large language models is viable and promising.